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Abstract 



Functional proteins must fold with some minimal stability to a struc- 
ture that can perform a biochemical task. Here we use a simple model to 
investigate the relationship between the stability requirement and the ca- 
pacity of a protein to evolve the function of binding to a ligand. Although 
our model contains no built-in tradeoff between stability and function, 
proteins evolved function more efficiently when the stability requirement 
was relaxed. Proteins with both high stability and high function evolved 
more efficiently when the stability requirement was gradually increased 
than when there was constant selection for high stability. These results 
show that in our model, the evolution of function is enhanced by allowing 
proteins to explore sequences corresponding to marginally stable struc- 
tures, and that it is easier to improve stability while maintaining high 
function than to improve function while maintaining high stability. Our 
model also demonstrates that even in the absence of a fundamental bio- 
physical tradeoff between stability and function, the speed with which 
function can evolve is limited by the stability requirement imposed on the 
protein. 



For nearly all proteins found in nature, there is a unique mapping from the 
linear protein sequence to a thcrmodynamically stable three-di mensional nativ e 
structure, with the mapping determined by the laws of physics (lAnfinsenl . ll97l . 
However, this unique mapping from sequence to conformation is not a general 
property of polypeptide sequences, since most randomly generated sequences 
do no t have stable folded structures (|Keefe and Szostakll200lt iDavidson et all 
119951) . In other words, natural protein sequences exist in the space of foldablc 
sequences, which is but a small subset of the space of all possible sequences. 
Therefore, evolution must have acted heavily on natural proteins in order to 
select those with stable native structures. 

Although natural proteins possess stable native structures, the evolutionary 
fitness of a protein depends not on the stability of the native structure per 
se, but rather on the stability of this structure being appropriate to allow the 
protein to perform a function such as catalyzing a chemical reaction or binding 
to a ligand. Stability is therefore under selection only insofar as it is necessary 
for biochemical function, and most natural pr oteins are onl y marginally stable 
at their physiologically relevant temperatures JFershtlEonl . 

In protein mutagenesis studies, stability and function can appear to be com- 
petin g properties, with mutations that increase stability often reducing func- 
tion |shoichet et all Il995l ISchreiber et aTLll994l). and mutatio ns that improve 
or alter function often decreasing stability JWang et all 120021) . However, sev- 
eral lines of evidence demonstrate that high stability and high functionality 
are not inherently incompatible. In Nature, there is a strong correlation be- 
tween the temperature of an organism's environment and the stability of its 
proteins, indicating that natural evolution is able to create functional and highly 
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stable proteins if there is sufficiently strong selection pressure ijSomerol Il995t 
iRees and Adamslll995|) . 

In the laboratory, protein engineers have also demonstrated that natural pro- 
teins are not maximally stable by using directed evolution to find m utations that 
make proteins more stable without sacrifi c ing enzymatic funct ion ( Giver et all 
Il998l lArnoldl ll99St ISerrano et all Il993t lArnold et all |2001[K These results 
show that high functionality and high stability can coexist, suggesting that the 
marginal stabilities of natural pro teins are due primarily to the simple fact that 
highly stable sequences are rare l|Taverna and Goldsteinl |2002|) . and therefore 
that most mutations to an evolved protein will decrease its stability. For this 
reason, proteins will tend to be no more stable than is required by their envi- 
ronment, since any extra stability that confers no further selective advantage 
will be eliminated by mutations. 

Comprehensive experimental examinations of protein evolution are limited 
by the vast number of possible sequences and the difficulties in rapidly assay- 
ing protein propertie s. However simple protein models originally developed to 

study protein fold i ng (iDill et al.l . ll995HHinds and Levitd . ll99llShakhnovich and Gutinl . 
Il993i ISocci et all Il998t) provide a useful tool f or studying protein evolution- 
ary dynamics l|Chan and Bornberg-Bauerl l2002|) . While these models are gross 
oversimplifications of real proteins, their tractability allows for a far more ex- 
tensive exploration of sequence space than can be done experimentally. Previ- 
ous st udies using model proteins have focused on the evolution of stable struc- 

tures dXia and Levitd.l2002tlCui et alll2002tlBa"stolla et allboOOHTaverna and Goldstein!. 

200(ilTiana" g^jjL . l2000tlBornberg-Bauer and Chanl . ll999l) or fas t-folding ilGutin and Shakhnovichl 
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. Il998tl proteins, while with few exceptions ( WilliamsetaTI 



Hirsll ll99*9n the interplay betweeen the evolution of stability and function 



has gone unexamined. Here we use a model protein to investigate how selec- 
tion for stability affects the evolution of function. In our model, we describe 
the function of a protein as its ability to bind to a rigid ligand molecule. The 
fitness of a protein depends on its ability to perform its function of binding to 
a ligand, which in turn depends on its ability to fold to a native structure with 
some minimal stability. We can increase the minimal stability requirement by 
increasing the temperature parameter, allowing us to explore the relationship 
between stability and the evolvability of function. 



Methods 

The Protein Model 

We use a highly simplified model of a protein consisting of a chain of N = 18 
monomers on a two-dimensional lattice that we allow to occupy any compact 
or noncompact conformation. 

The monomers can be of 20 types, corresponding to the 20 amino acids. 
Each monomer on the lattice has four nearest neighbor sites, of which as many 
as two can be occupied by non-bonded neighboring residues (three in the case 
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of terminal residues). The energy E (C) of a protein conformation C is the sum 
of the nearest-neighbor interactions of non-bonded residues, 

N i-2 

^(c)=EE c «( c ) xe W^i)' 
»=i j=i 

where G^ (C) equals one if residues i and j are nearest neighbors in conforma- 
tion C and zero otherwise, and e (_4 2 ; , Aj ) is the interaction energy between 
residue types Ai and Aj. The interaction energies e(Ai,Aj) are based on 
a widely used statistical analy sis of real proteins by Miyazawa and Jernigan 
l|Mivazawa and JerniganLll985j) fTable V). All energies are given in reduced units 
such that one energy unit equals fc^T at room temperature (298 K). Tempera- 
tures are given in units such that T = 1.0 at room temperature. 

Folding the Proteins 

The native structure and stability of the protein can be determined by finding 
the lowest energy conformation, Ci ow , and the partition function. Computation 
of the partition function requires defining a temperature parameter T. This 
temperature parameter represents the thermodynamic temperature, however 
since the model protein interaction energies are independent of temperature, 
the temperature parameter does not capture behaviors of real proteins that are 
caused by the temperature dependence of the interaction energies (for example, 
cold denaturation). In order to avoid confusion, we refer to T as the temperature 
parameter rather than as the temperature. 

The partition function at a temperature parameter of T is: 

Q(T) = ^cxp [-E(Ci)/T], 

where the sum is taken over all conformations {Ci}. The free energy of folding 
AG/ (T) to Ciow is then the difference between E (C\ ow ) and the free energy of 
the ensemble of all other conformations, 

AG/ (T) = E (Ciow) + T\n{Q (T) - exp [-E (C low ) /T]} . 

The fraction of proteins / (T) that are expected to be folded to Ci ow at equilib- 
rium is given by 

f (T) = . 

J V ' 1 + exp [AG/ (T) /T] 

Exact calculation of Q (T) requires enumeration of all 5.81 x 10 6 unique 
conformation s corresponding t o all of the self-avoiding walks that are not related 
by symmetry l)Rapaportlll987t) . Many of these walks have very few contacts, and 
so make only a small contribution to the partition function. We only explicitly 
considered the 7.95 x 10 5 conformations with more than four contacts. The 
remaining 5.01 x 10 6 conformations were treated by a crude mean-field model, 
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estimating the partition sum contribution of all conformations with n contacts 
(0 < n < 4) as 

Q n (T) = exp x cxp (^f^ x TV (n) 

where (e) is the average residue-residue contact energy for the given protein 
sequence assuming any residue is equally likely to be in contact with any other 
non-adjacent residue, of is the variance in the residue-residue contact energy, 
and M (n) is the number of conformations with n contacts. This approximation 
introduces only a very small error — a test of 10 3 random sequences at T = 
1.0 showed that the root-mean-square error and maximum differences between 
the approximate and exact values of AG/ (T) were 1.6 x 10~ 4 and 2.8 x 10~ 3 
respectively. This error had no effect on the evolutionary trajectories, since 
running a sample trajectory with and without the approximation led to identical 
results. Folding a protein took roughly 0.3 seconds on a 2 GHz processor. 

Modeling the Protein Function 

We introduce the concept of function by considering the binding of a ligand to 
the p rotein, an idea which to our knowledge was first introduced by l|Miller and Dil" 
119971) . We define the function of a protein as its ability to bind to a rigid lig- 
and when the protein is in its lowest energy conformation C\ ow . The binding 
energy BE(Ci ow ) is the interaction energy between the folded protein and the 
rigid ligand in the lowest energy binding position, found by searching all trans- 
lational and rotational positions of the ligand relative to the protein. Figure ^ 
shows the ligand used in the simulations bound to a protein in its lowest energy 
conformation. 

The Fitness Function 

The fitness T (T) of a protein at a temperature parameter of T is defined as the 
negative product of the fraction of the proteins that are folded / (T) times the 
binding energy BE (Ci ow ) of the folded protein to the ligand, so that 

T{T) = —f (T) x BE (Ci ow ) 

= ~l + eX p[AG f (T)/T\ xBE ( Cl ™)- 

Since T (T) has a sigmoidal dependence on AG/ (T), the fitness of an un- 
structured protein is essentially zero, and a protein gains fitness as it achieves 
some minimal stability determined by the temperature parameter. Once a pro- 
tein has achieved the minimal stability, it can only substantially improve its 
fitness by improving its ligand binding function. The stringency of the stability 
requirement depends on the temperature parameter T at which the fitness is 
computed, with higher temperature parameters favoring greater stabilities. 
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Strictly speaking, the fraction of proteins bound to the ligand also displays 
a sigmoidal dependence on the product of the ligand binding energy and the 
fraction of folded proteins. However, since we are only interested in differences 
in fitnesses rather than their magnitudes, any function that monotonically in- 
creases with this product will give the same results, and so we choose the simpler 
functional form defined above. 

Evolving the Proteins 

Each evolutionary replicate began with a population of 99 random sequences. 
At each generation, the 33 most fit sequences were selected, and each was used 
to generate two identical offspring. Random point mutations were made in all 99 
resulting proteins with a per-site mutation rate of 3.3 x 10 -2 , which corresponds 
to a per-protein mutation rate of 0.6. The mutated proteins were then re-folded, 
and their fitnesses were calculated. 

For the evolutionary trajectories that began with evolved proteins, we first 
evolved random populations for 250 generations to bind to each of the lig- 
ands shown in Figure B The best binding sequences for ligands one, two, and 
three were FFKFKKFKIFMLK WMKMF , FMGFMIIFFLKFKKFGWF, and 
MFHVFCHFEWPKPMKCFM respectively. These sequences were then used 
for the initial identical populations of 99 proteins for the evolutionary runs, 
which were otherwise carried out as before. 

Results 

Rapid Evolution of Fitness at Different Stabilities 

We carried out 50 evolutionary replicates each beginning with a different initial 
random population at five different reduced temperature parameters, T = 0.8, 
0.9, 1.0, 1.1, and 1.2. All replicates exhibited similar evolutionary trajectories, 
with a rapid gain of fitness in the first few hundred generations followed by 
only small subsequent increases in fitness. Figure D shows two typical replicates 
at a temperature parameter of 1.0. Both the stability and the binding energy 
improved over time, with improvements in binding energy usually associated 
with temporary small decreases in stability. 

Figure D also shows the lowest energy conformations at four different points 
in the evolutionary trajectory for the two proteins. The overall structures of the 
proteins were highly conserved, and the stabilities and binding energies adjusted 
primarily by changes in sequence that preserved these basic structures. This 
behavior is consistent with the evolution of real proteins, which is believed to 
involve changes in sequence that conserve the structural scaffold of the protein. 
The ligand binding arrangement shown in Figurcfjis typical of that of an evolved 
protein. The most fit proteins usually evolved to fold to compact conformations 
with a binding region that fits into the cavity of the ligand. 
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Proteins Evolve Ligand Binding Function More Efficiently 
at Lower Stability 

In order to determine how selection for stability affected the evolution of ligand 
binding function, we examined the binding energies achieved after 500 genera- 
tions of evolution at all 5 temperature parameters. Figure D shows the distribu- 
tion of binding energies for all cases. At least a few replicates evolved strong 
binding proteins at all of the temperature parameters. However, the frequency 
of evolution of strong binders was much higher at lower temperature parameters, 
whereas at higher temperature parameters many of the evolutionary trajecto- 
ries became stuck at weak binding proteins. The binding energy distributions 
are statistically different for temperature parameters that varied by more than 
0.2 with confidences of greater than 0.95 (Kolmogorov-Smirnov test, D and P 
values for comparison of T = 0.8 and T = 1.0, T = 0.8 and 1.1, T = 0.8 and 
1.2, T = 0.9 and T = 1.1, T = 0.9 and 1.2, and T = 1.0 and T = 1.2 arc 0.64 
and 7.8 x 10~ 10 , 0.58 and 3.7 x 10" 8 , 0.68 and 4.8 x 10" 11 , 0.34 and 4.4 x 1Q- 2 , 
0.60 and 1.08 x 10~ 8 , and 0.52 and 1.2 x 10" 6 respectively ((Press et all 12002^ . 



Table ^ shows the mean binding energies, stabilities, and fitnesses evolved 
by the proteins. At higher temperature parameters, the proteins evolved higher 
stability but weaker binding. These results indicate that strong selection for 
stability inhibits the evolution of strong binding. The evolution of a few strong 
binders at high temperature parameters shows that it is fundamentally pos- 
sible to evolve good binding under strong selection for stability, however the 
results clearly indicate that the statistical likelihood of evolving strong binding 
is decreased by increasing the selection for stability. 

Low Stability Evolution as a Route to High Stability Fitness 

Since sequences tend to evolve stronger binding at lower temperature parame- 
ters, we speculated that it might be possible to evolve high stability and strong 
binding proteins more efficiently by performing the initial generations at low 
temperature parameters. This approach is analogous to simulated annealing, 
except that in this case the temperature parameter is being increased, since 
performing the initial generations at low a temperature parameter helps the 
proteins escape weak binding traps. We tested this idea by performing 50 repli- 
cates in which the temperature parameter was set at 0.8 for the first 200 steps, 
then increased in a linear gradient from 0.8 to 1.2 for 200 steps, and then kept at 
1.2 for the final 100 steps. We compared the average final fitness of the proteins 
after this 500 generation gradient to the final fitness after 500 generations at 
a constantly high temperature parameter of 1.2 to determine which approach 
better allows the proteins to optimize the combination of stability and ligand 
binding function. Figurc B shows that the proteins tended to evolve higher fitness 
with the gradient than with the constantly high temperature parameter. The 
two distributions are statistically differe nt with high confid ence (Kolmogorov- 
Smirnov test, D = 0.40, P = 4.2x 10" 4 ((Press et all 12002^ ) . 

The gradient approach is more efficient at evolving high fitness because it 
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prevents the proteins from becoming trapped in regions of high stability but 
weak binding by allowing them to first evolve strong binding and then improve 
their stability. This can be seen in the two typical trajectories shown in Fig- 
ure B The immediate selection for high stability of the constant temperature 
parameter approach locks the proteins into stable structures that cannot easily 
improve their binding energy, whereas gradual selection for stability allows the 
proteins to first achieve strong binding and then optimize their stability. These 
results indicate that it is easier to improve stability while maintaining strong 
binding than it is to improve binding while maintaining high stability. 

Evolution from Different Initial Proteins 

The results we have described thus far are from evolutionary trajectories that 
began with random protein sequences. Biological and laboratory protein evo- 
lution do not start with random sequences, but instead modify the properties 
of existing proteins. To test whether the trends we observed depend on the 
initial populations, we repeated our experiments beginning with proteins that 
had been evolved to bind to different ligands by first evolving random protein 
populations for 250 generations to bind to the three ligands shown in Figure 
B We then used the most fit protein from each of these runs as the beginning 
sequence for 50 runs of evolution for binding to the original ligand used above 
(Figure 0}. 

All of the trends we found by beginning with random populations were pre- 
served when we started from these evolved proteins. We again found that evo- 
lutionary trajectories at lower temperature parameters yielded stronger bind- 
ing final proteins, and that the gradient was more effective at evolving high 
temperature fitness than a constant high temperature parameter (Figure^. The 
different initial populations do lead to different final binding energies and fit- 
nesses, with random populations tending to lead to better values. However, 
the trends of lower temperature parameter leading to stronger binding and a 
gradient approach leading to higher fitness hold for all four initial populations. 

Discussion 

We have shown that strong selection for stability inhibits the evolution of lig- 
and binding function by the proteins in our model. The ability of a few pro- 
teins to evolve strong binding at high temperature parameters shows that there 
are sequences that exhibit both good stability and strong binding. However, 
at high temperature parameters, the evolving proteins are more likely to be- 
come trapped in regions of sequence space that correspond to highly stable but 
weakly binding proteins. Presumably this trapping is due to that fact that 
stronger selection for stability reduces the network of sequences that are es- 
sentially neu tral since there are fewer highly stable seque nces compatible with 
a given fold l|Shakhnovichl ll99St iKoehl and Levitt! l2002j) . Therefore, at high 
temperature parameters, mutations that increase binding are more likely to lead 
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to an unacceptably large drop in stability. The net effect is a roughening of the 
fitness landscape that makes it more difficult to escape local optima. 

We present a strategy to overcome this problem of the evolutionary trajecto- 
ries becoming trapped at high stability but weak binding proteins. Performing 
the initial rounds of evolution at low a temperature parameter decreases the se- 
lection for stability, and so allows the proteins to more easily find strong binding 
regions of sequence space. The temperature parameter can then be increased, 
which leads to the selection of more stable sequences. Our results indicate 
that this approach is more effective for evolving highly stable and strong bind- 
ing proteins than constant selection for both high stability and strong binding. 
This strategy takes advantage of the fact that it is easier to maintain strong 
ligand binding while improving stability than to maintain high stability while 
improving binding. 

Our results fit into the framework of current theories about the distribu- 
tions of proteins in sequence space that has emerged from other lattice pro- 
tein studies. These studies have shown that protein structures are coded for 
by structurally neutral networks spanning many diverse sequences, and that 
these networks are structured as superfunncls, wit h the most stable sequences 
also posessing the most connections in the networks feornberg-Bauer and Chanl . 
Il99a iBroelia et all. Il999t iBastolla et all l2000t Il999|) . Our work suggests that 
a protein evolves function most effectively when it can freely explore in its 
structurally neutral network, rather then when it is trapped in a small number 
of highly stable sequences. Our initial relaxation of the stability requirement 
facilitates exploration of the structurally neutral network, and once highly func- 
tional sequences are found, they can be optimized for stability. Although we do 
not consider recombin ation in our current study, other work ( Cui et all l2002t 
IXia and Levitt! l2002fl has shown that while structurally neutral networks can 
easily be explored locally by point mutations, moves between networks or to dis- 
tant regions of the same network are factilited by crossover-induced sequence 
space jumps. Therefore, we suggest that addition of recombination to our evo- 
lutionary protocol may further assist in the evolution of function. 

The evolution of our model proteins also has strong parallels with real pro- 
tein evolution. As with real proteins, our model proteins evolve primarily by 
structurally conservative mutations that tinker with the contacts in a preserved 
structural scaffold, rather than by mutations that cause wholesale structural 
changes. The interplay between the evolution of stability and function in our 
model is also reminiscent of real protein evolution; for example, in the evolution 
of new function in TEM-1 /3-lactamasc, gains in function were co rrelated with 
drops in stability, followed by gradual regaining of the lost stability l|Wang et all 
120021) . 

Our model points to general trends that are important in both natural and 
experimental protein evolution, where different structural and functional prop- 
erties are under different selection pressures. Protein evolution involves con- 
current selection for stability and function, and productive mutations must im- 
prove one of these properties without excessively damaging the other. Since 
most mutations to evolved proteins will be deleterious to at least one of these 
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properties, strong selection for both stability and function will limit the number 
of productive mutations, and so lead to trapping at local fitness optima. Protein 
evolution therefore occurs most efficiently when the temporary drops in stability 
associated with gains in function arc buffered by mild selection for stability. 
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Temperature 


(BE) 


(AG/| T=1 . n ) 


(3~ T=l.o) 


0.8 


-18.47 ± 0.32 


-1.60 ± 0.06 


14.75 ± 0.29 


0.9 


-16.99 ± 0.37 


-1.95 ± 0.07 


14.57 ± 0.32 


1.0 


-15.78 ± 0.37 


-2.29 ± 0.07 


14.18 ± 0.32 


1.1 


-15.08 ± 0.42 


-2.65 ± 0.09 


13.99 ± 0.37 


1.2 


-13.78 ± 0.38 


-2.82 ± 0.09 


12.97 ± 0.32 



Table 1: Average Binding Energies, Stabilities, and Fitnesses after 500 
generations. The stabilities and fitnesses are computed at a reference tempera- 
ture parameter of 1.0 to allow comparison. Means are shown plus/minus their 
standard errors. Values are averages are of the most fit member of all 50 popu- 
lations. 

Figure 1: Lowest energy conformation of a protein (at left) bound to the rigid 
ligand used in the simulations (at right, shown in bold) in the lowest energy 
binding position. The stability of the protein at a temperature parameter of 
T = 1.0 is AG/ = —1.04, and the binding energy of the protein to the ligand is 
BE = -17.90. 

Figure 2: Two typical replicates performed at a temperature parameter of 1.0. 
The plots at the top show the evolution of fitness, of stability (solid lines), 
and binding energy (dotted lines) of the most fit member of the population. 
Structures at bottom are of the most fit sequence at 10, 170, 330, and 500 
generations. 

Figure 3: Ligand binding function evolves more efficiently at lower temperature 
parameters. The histogram shows the distribution of the best binding ener- 
gies after 500 generations of evolution for all 50 replicates at each temperature 
parameter. Binding energies are of the most fit member of the population. 

Figure 4: High temperature parameter fitness evolves more efficiently with a 
gradient than with constant selection at a high temperature parameter. The 
histogram shows the distribution of fitnesses after 500 generations for evolution 
at a constant temperature parameter of 1.2 and evolution with the gradient 
from 0.8 to 1.2. Fitnesses are of the most fit member of the population. 

Figure 5: Evolution occurs more efficiently when proteins can first evolve strong 
ligand binding function at low temperature parameters. Solid lines show the 
stability-function trajectories with a gradient from 0.8 to 1.2, and dotted lines 
show trajectories with a constant high temperature parameter of 1.2. At left the 
final fitnesses are 14.01 for the gradient and 10.52 for the constant temperature 
parameter; at right, the values are 13.92 and 10.60 respectively. Plots are for the 
most fit member of the population sampled every 10 generations. Stabilities are 
computed at a reference temperature parameter of T = 1.0 to allow comparison. 
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Figure 6: Proteins were evolved for 250 generations to bind to these ligands. 
We then modified the function of these evolved proteins by evolving them to 
bind to the original ligand shown in Figure ^ 



Figure 7: Lower temperature parameter evolution leads to stronger binding, 
and a gradient approach leads to better high-temperature parameter fitness 
regardless of the initial population. At left arc the binding energies for evolution 
at the indicated temperature parameter with the indicated initial populations. 
At right are the fitnesses at a temperature parameter of 1.2 with evolution at a 
constant temperature parameter or with a gradient. Values are the mean and 
standard error for the most fit member of the population for all 50 runs after 
500 generations. 
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